- Read in the gapminder_clean.csv data as a tibble using read_csv.
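library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(plotly)     # ggplotly()
df = read_csv("gapminder_clean.csv", col_types = cols(Year = col_integer())) %>% select(-1)  # drop the unnamed row-index column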
head(df)
## # A tibble: 6 x 19
## `Country Name` Year `Agriculture, value a~` `CO2 emissions~` `Domestic cred~`
## <chr> <int> <dbl> <dbl> <dbl>
## 1 Afghanistan 1962 NA 0.0738 21.3
## 2 Afghanistan 1967 NA 0.124 9.92
## 3 Afghanistan 1972 NA 0.131 18.9
## 4 Afghanistan 1977 NA 0.183 13.8
## 5 Afghanistan 1982 NA 0.166 NA
## 6 Afghanistan 1987 NA 0.276 NA
## # ... with 14 more variables:
## # `Electric power consumption (kWh per capita)` <dbl>,
## # `Energy use (kg of oil equivalent per capita)` <dbl>,
## # `Exports of goods and services (% of GDP)` <dbl>,
## # `Fertility rate, total (births per woman)` <dbl>,
## # `GDP growth (annual %)` <dbl>,
## # `Imports of goods and services (% of GDP)` <dbl>, ...
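Several column names in this file contain spaces and punctuation, which is why they are wrapped in backticks in the code. A quick base-R check, using a made-up two-line CSV rather than the real file, shows what read.csv() does to such names by default and how check.names = FALSE preserves them (readr's read_csv() also leaves spaces in names untouched).

```r
# Hypothetical two-line CSV, not the gapminder file
csv_text <- "Country Name,Year\nAfghanistan,1962"

# Default: names are made syntactic, so the space becomes a dot
mangled <- read.csv(text = csv_text)
names(mangled)  # "Country.Name" "Year"

# check.names = FALSE keeps the original header, which then needs backticks
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)     # "Country Name" "Year"
kept$`Country Name`
```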
- Filter the data to include only rows where Year is 1962, then make a scatter plot comparing 'CO2 emissions (metric tons per capita)' and gdpPercap for the filtered data.
df_1962 = df %>% filter(Year == 1962)
# drop rows missing either variable; `text` is not drawn by ggplot2, but ggplotly() shows it in tooltips
p = df_1962 %>% filter(!is.na(gdpPercap) & !is.na(`CO2 emissions (metric tons per capita)`)) %>%
  ggplot(aes(x = gdpPercap, y = `CO2 emissions (metric tons per capita)`, text = `Country Name`)) +
  geom_point() + scale_x_log10() + scale_y_log10() + theme_bw() +
  labs(title = "CO2 emissions vs GDP per capita, 1962", x = "GDP per capita")
ggplotly(p)
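The filter(!is.na(...) & !is.na(...)) step keeps only countries that have both values, since every point needs an x and a y. The same complete-case filter can be sketched in base R with complete.cases(); the tiny data frame below is invented for illustration and uses a shortened CO2 column name.

```r
# Invented example rows; NA marks a missing measurement
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  gdpPercap = c(500, NA, 12000, 3000),
  co2       = c(0.1, 2.5, NA, 1.2)
)

# Keep rows where both plotted variables are present
plotted <- toy[complete.cases(toy[, c("gdpPercap", "co2")]), ]
plotted$country  # "A" "D"
```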
- On the filtered data, calculate the correlation of 'CO2 emissions (metric tons per capita)' and gdpPercap. What is the correlation and associated p value?
# cor.test() drops incomplete pairs itself; it has no `use` argument, so passing one does nothing
cor.test(df_1962$gdpPercap, df_1962$`CO2 emissions (metric tons per capita)`)
##
## Pearson's product-moment correlation
##
## data: df_1962$gdpPercap and df_1962$`CO2 emissions (metric tons per capita)`
## t = 25.269, df = 106, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8934697 0.9489792
## sample estimates:
## cor
## 0.9260817
In 1962 the Pearson correlation is about 0.926 (95% CI 0.893 to 0.949), with p < 2.2e-16.
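The t statistic in this output follows directly from the correlation: for Pearson's r on n complete pairs, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. Plugging in the reported values recovers t of about 25.27, and the two-sided p value from pt() is far below 2.2e-16, which is only the display threshold R uses when printing p values.

```r
# Values reported by cor.test() above
r  <- 0.9260817
df <- 106  # degrees of freedom, n - 2, so n = 108 complete country pairs

# Test statistic for H0: true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)  # 25.269
p_val < 2.2e-16   # TRUE
```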
- On the unfiltered data, answer “In what year is the correlation between 'CO2 emissions (metric tons per capita)' and gdpPercap the strongest?” Filter the dataset to that year for the next step…
# Pearson correlation per year, using only complete pairs within each year
df %>%
  group_by(Year) %>%
  summarize(Correlation = cor(gdpPercap, `CO2 emissions (metric tons per capita)`, use = "complete.obs")) %>%
  arrange(desc(Correlation))
## # A tibble: 10 x 2
## Year Correlation
## <int> <dbl>
## 1 1967 0.939
## 2 1962 0.926
## 3 1972 0.843
## 4 1982 0.817
## 5 1987 0.810
## 6 1992 0.809
## 7 1997 0.808
## 8 2002 0.801
## 9 1977 0.793
## 10 2007 0.720
The correlation is strongest in 1967 (r ≈ 0.939), so the next step uses the 1967 data.
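The group_by()/summarize() pipeline computes one correlation per year and sorts them. An equivalent base-R sketch splits the data by year, applies cor(..., use = "complete.obs") to each piece, and picks the maximum with which.max(); the toy data are invented so the result can be checked by hand. Here "strongest" is taken as the largest value, which matches the ranking above because every yearly correlation is positive; with negative correlations one would rank by abs() instead.

```r
# Invented data: two years, with one missing value in 1967
toy <- data.frame(
  Year = rep(c(1962, 1967), each = 4),
  gdp  = c(1, 2, 3, 4,  1, 2, 3, 4),
  co2  = c(2, 1, 4, 3,  1, 2, 3, NA)
)

# One Pearson correlation per year, ignoring incomplete pairs
cors <- sapply(split(toy, toy$Year),
               function(d) cor(d$gdp, d$co2, use = "complete.obs"))
cors                    # named vector: 1962 = 0.6, 1967 = 1
names(which.max(cors))  # "1967"
```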
- Using plotly, create an interactive scatter plot comparing 'CO2 emissions (metric tons per capita)' and gdpPercap, where the point size is determined by pop (population) and the color is determined by the continent. You can easily convert any ggplot plot to a plotly plot using the ggplotly() command.
df_1967 = df %>% filter(Year == 1967)
p = df_1967 %>% filter(!is.na(gdpPercap) & !is.na(`CO2 emissions (metric tons per capita)`)) %>%
  ggplot(aes(x = gdpPercap, y = `CO2 emissions (metric tons per capita)`,
             text = `Country Name`, color = continent, size = pop)) +
  geom_point() + scale_x_log10() + scale_y_log10() + theme_bw() +
  labs(title = "CO2 emissions vs GDP per capita, 1967", x = "GDP per capita")
ggplotly(p)
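Mapping pop to size does not make marker radius proportional to population: ggplot2's default scale_size() scales point area, over a default size range of 1 to 6. The helper below is a hypothetical, simplified sketch of that idea (rescale to 0 to 1, take the square root, stretch to the size range), not ggplot2's actual code.

```r
# Hypothetical helper approximating an area-based size mapping
size_for_area <- function(x, range = c(1, 6)) {
  scaled <- (x - min(x)) / (max(x) - min(x))  # rescale values to [0, 1]
  range[1] + sqrt(scaled) * diff(range)       # sqrt so marker area, not radius, grows with x
}

size_for_area(c(0, 25, 100))  # 1.0 3.5 6.0
```

Under this mapping, a country with four times the rescaled population gets roughly twice the marker size rather than four times, which keeps the most populous countries from swamping the plot.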